Random sampling from a search engine's index
Similar resources
Random Sampling from a Search Engine’s Corpus
We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search engine’s public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder suffers from a well-recorded bias: it favors long documents. I...
Random mappings designed for commercial search engines
We give a practical random mapping that takes any set of documents represented as vectors in Euclidean space and then maps them to a sparse subset of the Hamming cube while retaining ordering of inter-vector inner products. Once represented in the sparse space, it is natural to index documents using commercial text-based search engines which are specialized to take advantage of thi...
3D Inverted Index with Cache Sharing for Web Search Engines
Web search engines achieve efficient performance by partitioning and replicating the indexing data structure used to support query processing. Current practice simply partitions and replicates the text collection on the set of cluster processors and then constructs in each processor an index data structure. This paper proposes a different approach by constructing an index data structure that pr...
Tree Search Stabilization by Random Sampling
We discuss the variability in the performance of multiple runs of Mixed Integer Linear solvers, and we concentrate on the one deriving from the use of different optimal bases of the Linear Programming relaxations. We propose a new algorithm exploiting more than one of those bases and we show that different versions of the algorithm can be used to stabilize and improve the performance of the sol...
Random Search versus Genetic Programming as Engines for Collective Adaptation
We have integrated the distributed search of genetic programming (GP) based systems with collective memory to form a collective adaptation search method. Such a system significantly improves search as problem complexity is increased. Since the pure GP approach does not scale well with problem complexity, a natural question is which of the two components is actually contributing to the search pro...
Journal
Journal title: Journal of the ACM
Year: 2008
ISSN: 0004-5411,1557-735X
DOI: 10.1145/1411509.1411514